In analyzing the temporal dynamics of the U.S. freight transportation industry, key variables emerge as critical factors shaping its trajectory. Economic indicators, encompassing metrics like GDP growth, consumer spending, and industrial production, serve as pivotal markers of overall economic activity. These indicators not only influence freight transportation demand but also offer insights into broader economic trends. Additionally, fuel prices wield significant influence, affecting operating costs and modal choice decisions within the transportation sector. Research underscores the impact of fuel price fluctuations on transportation costs, suggesting their inclusion as exogenous variables in ARIMAX models to enhance freight volume forecasts.
When considering modeling approaches, both ARIMAX and VAR models offer distinct advantages for capturing the complexities of freight transportation dynamics. ARIMAX models are well-suited for incorporating economic indicators, fuel prices, and infrastructure investment as variables. Leveraging their autoregressive and exogenous components, ARIMAX models can effectively account for temporal patterns and external influences, thereby improving the accuracy of freight volume forecasts over time. On the other hand, VAR models excel in analyzing the dynamic relationships among multiple variables. They provide a versatile framework for examining the simultaneous effects of economic indicators, fuel prices, regulatory changes, and technological advancements on freight transportation dynamics. With their capacity to capture intricate interactions and feedback mechanisms, VAR models offer valuable insights into the evolving landscape of the freight transportation industry.
In my ARIMAX/SARIMAX model, I’ve chosen to focus on two key datasets. Firstly, the COVID-19 pandemic has reshaped societal norms significantly, leading to notable shifts in consumer behaviors and lifestyle patterns. With restrictions in place and mobility constrained, there has been a substantial surge in demand for delivery services. UPS, as a prominent player in the U.S. delivery service sector, has been instrumental in meeting this increased demand. My goal is to examine whether there exists a relationship between the number of new COVID-19 cases and UPS stock prices. To accomplish this, I will employ an ARIMAX model to delve into the dynamics between these variables and discern any potential impact of the pandemic on UPS stock performance.
Secondly, fuel prices play a pivotal role in the freight industry, exerting a significant influence on operational costs. I aim to investigate whether fluctuations in fuel prices affect freight values, particularly within the international freight sector. Additionally, I aim to assess the potential impact of the Transportation Services Index (TSI) on freight values. To explore these relationships comprehensively, I intend to utilize the SARIMAX model. This model facilitates the analysis of time series data while accounting for external factors such as fuel prices and the TSI. Through this analysis, I seek to gain insights into the interplay between fuel prices, the TSI, and the values of Canadian freight in the U.S. landscape.
For the VAR model, the analysis delves into three distinct sets of variables within the air cargo industry to uncover their interrelationships and implications. Firstly, it examines the connections between Air Cargo (Domestic), Air Cargo (International), Fuel Price, and the Transportation Services Index (TSI). This investigation aims to discern whether fluctuations in air cargo volumes, both domestically and internationally, correlate with changes in fuel prices and TSI, thereby providing insights into the broader drivers of the air cargo industry.
Secondly, the analysis focuses on the relationship between Air Cargo (Domestic), Air Cargo (International), and Employment. By exploring the mutual influence between air cargo volumes and employment levels. Understanding these dynamics is crucial for grasping the industry’s labor market dynamics and its broader economic implications.
Lastly, the analysis considers the dynamics between GDP, Revenue, and Employment within the air industry. By examining the interplay between these key economic indicators, it aims to uncover the industry’s economic performance and its effects on employment trends. This exploration provides valuable insights into the industry’s contribution to the broader economy and its role in shaping employment patterns.
Define Models
SARIMAX: UPS Stock Price ~ Covid New Cases (Weekly)
Code
# SARIMAX: ups stock price and covid new cases (weekly)# get the weekly stock price of upsups <-getSymbols("UPS",auto.assign =FALSE, from ="2020-01-01", to ="2023-05-15",src="yahoo") ups=data.frame(ups)ups <-data.frame(ups,rownames(ups))colnames(ups)[7] ="date"ups$date<-as.Date(ups$date,"%Y-%m-%d")ups <- ups[seq(1, nrow(ups), by =5), ]# get the weekly covid new casescovid <-read.csv("data/clean_data/covid.csv",header =TRUE, sep =',')covid <-subset(covid, select =c(date, new_cases_per_million))covid <-na.omit(covid)covid <-head(covid, -6)covid$new_cases_per_million[covid$new_cases_per_million ==0.000] <-0.001# combine the dataups_covid <-data.frame("UPS"=log(ups$UPS.Adjusted), "COVID"=log(covid$new_cases_per_million))#(head(ups_covid))# Convert to a Time Series Component ups_covid_ts <-ts(ups_covid, start =decimal_date(as.Date("2020-01-02", format ="%Y-%m-%d")), frequency =52)head(ups_covid_ts)
SARIMAX: Canada Freight Value ~ Fuel Price + TSI (Monthly)
Code
# SARIMAX: Canada Freight Value and Fuel Price, TSI (monthly)# get the canada freight monthly valuecanada <-read.csv("data/clean_data/24monthly_canada_freight.csv")canada <- canada[,c("Date","Value")]# get the fuel monthly pricefuel <-read.csv("data/clean_data/34monthly_fuel_prices.csv")# remove the dollar signfuel$Diesel <-as.numeric(gsub("\\$", "", fuel$Diesel))fuel$Jet.Fuel <-as.numeric(gsub("\\$", "", fuel$Jet.Fuel))# select the same start datafuel1 <- fuel[73:(nrow(fuel)-6),]# get the TSI monthly datatsi <-read.csv("data/clean_data/35monthly_TSI.csv")tsi1 <- tsi[73:nrow(tsi),]# combine the datacanada_value <-data.frame("Canada_value"=canada$Value, "Fuel"=fuel1$Diesel, "TSI"=tsi1$Transportation.Services.Index...Freight)#(head(canada_value))canada_value_ts <-ts(canada_value, start =decimal_date(as.Date("2006-01-01", format ="%Y-%m-%d")), frequency =12)head(canada_value_ts)
Canada_value Fuel TSI
Jan 2006 43.47682 246.7 112.7
Feb 2006 42.49097 247.5 111.8
Mar 2006 47.70424 255.9 112.3
Apr 2006 44.39409 272.8 111.5
May 2006 46.93539 289.7 113.8
Jun 2006 46.93349 289.8 113.2
VAR: Air Cargo(Domestic) ~ Air Cargo(International) + Fuel Price + TSI (Monthly)
Code
# VAR: Air Cargo and Fuel Price, TSI (monthly)# get the cargo datacargo <-read.csv("data/clean_data/Monthly_air_revenue.csv")cargo$International <- cargo$U.S..Air.Carrier.Cargo..millions.of.revenue.ton.miles....International/1000000000cargo$Domestic <- cargo$U.S..Air.Carrier.Cargo..millions.of.revenue.ton.miles....Domestic/1000000000cargo <- cargo[4:(nrow(cargo)-8),]# get the fuel monthly pricefuel <-read.csv("data/clean_data/34monthly_fuel_prices.csv")# remove the dollar signfuel$Diesel <-as.numeric(gsub("\\$", "", fuel$Diesel))fuel$Jet.Fuel <-as.numeric(gsub("\\$", "", fuel$Jet.Fuel))# select the same start datafuel2 <- fuel[37:(nrow(fuel)-6),]# get the TSI monthly datatsi <-read.csv("data/clean_data/35monthly_TSI.csv")tsi2 <- tsi[37:nrow(tsi),]# combine the datarevenue_ <-data.frame("Domestic"=cargo$Domestic, "International"=cargo$International, "Fuel"=fuel2$Jet.Fuel, "TSI"=tsi2$Transportation.Services.Index...Freight)#(head(revenue_))revenue_ts <-ts(revenue_, start =decimal_date(as.Date("2003-01-01", format ="%Y-%m-%d")), frequency =12)head(revenue_ts)
Domestic International Fuel TSI
Jan 2003 1.177785 1.395876 88.7 104.2
Feb 2003 1.079561 1.366617 105.5 103.4
Mar 2003 1.233953 1.691870 89.3 103.9
Apr 2003 1.208202 1.524837 74.3 104.0
May 2003 1.243021 1.489449 71.4 103.0
Jun 2003 1.207785 1.473214 74.8 102.9
VAR: Employment ~ Air Cargo(Domestic) + Air Cargo(International) (Monthly)
Code
# VAR: Employment and Air Cargo (monthly)# get the employment dataemployment <-read.csv("data/clean_data/32monthly_employment.csv")employment <- employment[,c("Date","Transportation.Employment...Air.Transportation")]# get the cargo datacargo <-read.csv("data/clean_data/Monthly_air_revenue.csv")cargo$International <- cargo$U.S..Air.Carrier.Cargo..millions.of.revenue.ton.miles....International/1000000000cargo$Domestic <- cargo$U.S..Air.Carrier.Cargo..millions.of.revenue.ton.miles....Domestic/1000000000cargo1 <- cargo[28:(nrow(cargo)-8),]# combine the dataemp_ <-data.frame("Employment"=employment$Transportation.Employment...Air.Transportation, "International"=cargo1$International, "Domestic"=cargo1$Domestic)#(head(emp_))emp_ts <-ts(emp_, start =decimal_date(as.Date("2005-01-01", format ="%Y-%m-%d")), frequency =12)head(emp_ts)
Employment International Domestic
Jan 2005 505200 1.696944 1.280516
Feb 2005 503400 1.561959 1.253130
Mar 2005 504200 1.958162 1.469765
Apr 2005 507300 2.078222 1.226824
May 2005 507800 1.925489 1.261165
Jun 2005 508300 1.981247 1.351714
VAR: GDP ~ Revenue + Employment (Air, Yearly)
Code
# VAR: Air GDP, Revenue and Employment (yearly)# get the yearly GDP of airgdp <-read.csv("data/clean_data/31GDP.csv")# Group by Year and Mode, calculate the sum of GDP within each groupgdp <- gdp %>%filter(Mode =='Air')%>%group_by(Year) %>%summarize(Value =sum(Current....billions., na.rm =TRUE))# get the air revenueair <-read.csv("data/clean_data/33revenue.csv")air <- air[air$Mode=="Air carrier, domestic",c("Year","Value")]air <- air[order(air$Year), ]air <- air[13:nrow(air),]# get the air employmentemployment <-read.csv("data/clean_data/32monthly_employment.csv")employment <- employment[,c("Date","Transportation.Employment...Air.Transportation")]employment$Date <-as.POSIXct(employment$Date, format ="%m/%d/%Y %I:%M:%S %p")employment$Year <-format(employment$Date, "%Y")employment <- employment %>%group_by(Year) %>%summarize(Value =sum(Transportation.Employment...Air.Transportation, na.rm =TRUE)/1000000)employment <- employment[8:17,]# combine the dataair_ <-data.frame("GDP"=gdp$Value, "Revenue"=air$Value, "Employment"=employment$Value)#(head(air_))air_ts <-ts(air_, start =decimal_date(as.Date("2012-01-01", format ="%Y-%m-%d")), frequency =1)(air_ts)
#ups_covid_tsautoplot(ups_covid_ts, facets = T) +labs(x="Date", y="", title ="UPS Stock Price and Covid New Cases")
The COVID-19 pandemic has been a profound influence on society, dramatically altering people’s lifestyles and behaviors. With restrictions in place and mobility limited, there has been a significant surge in demand for delivery services. UPS, as a leading company in the delivery service sector in the U.S., stands at the forefront of meeting this increased demand. I aim to investigate whether there is a relationship between the number of new COVID-19 cases and UPS stock prices. To do this, I will utilize an ARIMAX model to explore the dynamics between these variables and discern any potential impact of the pandemic on UPS stock performance.
Code
#canada_value_tsautoplot(canada_value_ts, facets = T) +labs(x="Date", y="", title ="Canada Freight Value and Fuel Price and TSI")
Fuel prices play a crucial role in the freight industry, significantly impacting operational costs. I aim to investigate whether fluctuations in fuel prices influence freight values, particularly in the international freight sector. Additionally, I seek to understand the potential impact of the Transportation Services Index (TSI) on freight values. To explore these relationships, I plan to utilize the SARIMAX model, which allows for the analysis of time series data while considering exogenous variables such as fuel prices and the TSI. By examining these dynamics, I hope to gain insights into the interplay between fuel prices, the TSI, and the Canada freight values in the U.S..
Series: ups_covid_ts[, "UPS"]
Regression with ARIMA(0,1,0)(1,0,0)[52] errors
Coefficients:
sar1 xreg
-0.0795 -0.0008
s.e. 0.0930 0.0068
sigma^2 = 0.001926: log likelihood = 289.38
AIC=-572.75 AICc=-572.61 BIC=-563.36
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.003119728 0.04349475 0.03097627 0.06025101 0.6193865 0.1143339
ACF1
Training set 0.01248485
Code
checkresiduals(fit0)
Ljung-Box test
data: Residuals from Regression with ARIMA(0,1,0)(1,0,0)[52] errors
Q* = 32.331, df = 33, p-value = 0.5002
Model df: 1. Total lags used: 34
Select model: Regression with ARIMA(0,1,0)(1,0,0)[52] errors.
Series: canada_value_ts[, "Canada_value"]
Regression with ARIMA(1,0,0)(1,0,0)[12] errors
Coefficients:
ar1 sar1 intercept fuel tsi
0.6281 0.6640 -23.8772 0.0581 0.4504
s.e. 0.0581 0.0602 8.0465 0.0059 0.0726
sigma^2 = 5.295: log likelihood = -462.94
AIC=937.88 AICc=938.3 BIC=957.81
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set -0.05212728 2.272867 1.762864 -0.2909279 3.694124 0.30965
ACF1
Training set -0.0597277
Code
checkresiduals(fit0)
Ljung-Box test
data: Residuals from Regression with ARIMA(1,0,0)(1,0,0)[12] errors
Q* = 42.707, df = 22, p-value = 0.005127
Model df: 2. Total lags used: 24
Select model: Regression with ARIMA(1,0,0)(1,0,0)[12] errors.
iter 44 value -3.036392
iter 45 value -3.036392
iter 46 value -3.036392
iter 47 value -3.036393
iter 48 value -3.036393
iter 48 value -3.036393
iter 48 value -3.036393
final value -3.036393
converged
<><><><><><><><><><><><><><>
Coefficients:
Estimate SE t.value p.value
ar1 1.0032 0.1123 8.9331 0.0000
ar2 0.7658 0.1589 4.8208 0.0000
ar3 -0.7919 0.0985 -8.0412 0.0000
ma1 0.0095 0.1403 0.0675 0.9463
ma2 -0.6163 0.1250 -4.9310 0.0000
xmean 0.0177 0.0618 0.2867 0.7747
sigma^2 estimated as 0.002259191 on 164 degrees of freedom
Coefficients:
Estimate SE t.value p.value
sar1 -0.1573 0.0945 -1.6643 0.0979
constant -0.0008 0.0035 -0.2249 0.8223
sigma^2 estimated as 0.00250965 on 167 degrees of freedom
AIC = -3.106522 AICc = -3.106094 BIC = -3.050962
The Standard Residual Plot both appears to have some irregularities in the mean and variation.
The Autocorrelation Function (ACF) plot model1 shows almost no correlation indicating that the model has harnessed everything and all that is left is white noise. This indicates a good model fit. model2 shows some correlations left.
The Quantile-Quantile (Q-Q) Plot both shows near normality.
The Ljung-Box test results model1 reveal values above the 0.05 (5% significance) threshold, indicating a good fit.
$ttable: All coefficients are significant below a 0.05 threshold except one in model1.
When comparing the two model diagnostics, the manual model is better, which was Regression with ARIMA(3,0,2)(0,0,0)[52] errors.
Coefficients:
Estimate SE t.value p.value
ar1 0.5987 0.0575 10.4168 0.0000
sar1 0.5922 0.0594 9.9712 0.0000
xmean 0.0671 0.9137 0.0734 0.9415
sigma^2 estimated as 5.427344 on 202 degrees of freedom
AIC = 4.595809 AICc = 4.596392 BIC = 4.660648
The Standard Residual Plot both appears to have some irregularities in the mean and variation.
The Autocorrelation Function (ACF) plot model1 shows almost no correlation indicating that the model has harnessed everything and all that is left is white noise. This indicates a good model fit. model2 shows some correlations left.
The Quantile-Quantile (Q-Q) Plot both shows near normality.
The Ljung-Box test results model1 reveal values above the 0.05 (5% significance) threshold, indicating a good fit.
$ttable: All coefficients are significant below a 0.05 threshold.
When comparing the two model diagnostics, the manual model is better, which was Regression with ARIMA(1,0,0)(2,1,1)[12] errors.
Series: ups_covid_ts[, "UPS"]
Regression with ARIMA(0,1,0)(1,0,0)[52] errors
Coefficients:
sar1 xreg
-0.0795 -0.0008
s.e. 0.0930 0.0068
sigma^2 = 0.001926: log likelihood = 289.38
AIC=-572.75 AICc=-572.61 BIC=-563.36
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.003119728 0.04349475 0.03097627 0.06025101 0.6193865 0.1143339
ACF1
Training set 0.01248485
GARCH model on SARIMAX residuals
Code
ggAcf((residuals(fit))^2)
Code
ggPacf((residuals(fit))^2)
Code
model <-list() ## set countercc <-1for (p in1:6) {for (q in1:6) {model[[cc]] <-garch(residuals(fit),order=c(q,p),trace=F)cc <- cc +1}} ## get AIC values for model evaluationGARCH_AIC <-sapply(model, AIC) ## model with lowest AIC is the bestwhich(GARCH_AIC ==min(GARCH_AIC))
Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(1, 1), data = residuals(fit), trace = F)
Mean and Variance Equation:
data ~ garch(1, 1)
<environment: 0x131a9c630>
[data = residuals(fit)]
Conditional Distribution:
norm
Coefficient(s):
mu omega alpha1 beta1
0.00311950 0.00079769 0.00000001 0.57805135
Std. Errors:
based on Hessian
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu 3.120e-03 3.327e-03 0.938 0.348
omega 7.977e-04 1.550e-04 5.146 2.67e-07 ***
alpha1 1.000e-08 NaN NaN NaN
beta1 5.781e-01 NaN NaN NaN
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log Likelihood:
292.1893 normalized: 1.71876
Description:
Fri Apr 26 00:06:29 2024 by user:
Standardised Residuals Tests:
Statistic p-Value
Jarque-Bera Test R Chi^2 79.2303887 0.000000e+00
Shapiro-Wilk Test R W 0.9470052 5.500240e-06
Ljung-Box Test R Q(10) 4.6081684 9.157711e-01
Ljung-Box Test R Q(15) 10.5752144 7.820846e-01
Ljung-Box Test R Q(20) 18.0518484 5.839926e-01
Ljung-Box Test R^2 Q(10) 3.7895993 9.563397e-01
Ljung-Box Test R^2 Q(15) 6.1393552 9.773106e-01
Ljung-Box Test R^2 Q(20) 7.6212699 9.940916e-01
LM Arch Test R TR^2 5.7006263 9.304150e-01
Information Criterion Statistics:
AIC BIC SIC HQIC
-3.390462 -3.316678 -3.391536 -3.360521
Alpha and beta are not significant, so SARIMAX model is enough.
#ups_covid_tscovid_fit <-auto.arima(ups_covid_ts[,'COVID'])fcov <-forecast(covid_fit, 52)fcast <-forecast(fit, xreg=fcov$mean,52) autoplot(fcast, main="Forecast of UPS Stock Price for the next year") +xlab("Year") +ylab("UPS Stock Price")
The forecast of the UPS stock price appears to be relatively flat, indicating that the model may not adequately capture the fluctuations observed in the original data.
Code
#canada_value_tsfuel_fit <-auto.arima(canada_value_ts[,'Fuel'])ffuel <-forecast(fuel_fit, 36)tsi_fit <-auto.arima(canada_value_ts[,'TSI'])ftsi <-forecast(tsi_fit, 36)fxreg <-cbind(fuel = ffuel$mean, tsi = ftsi$mean) fcast <-forecast(fit0, xreg=fxreg,36) autoplot(fcast, main="Forecast of Canada Freight Value for the next three years") +xlab("Year") +ylab("Canada Freight Value")
The trend of the Canada freight value is observed to be decreasing, and the model appears to capture the seasonal fluctuations present in the original data.
Findings
For the UPS stock price and COVID-19 new cases, there appears to be a correlation between the two variables, with the UPS stock price showing an increase alongside the rise in COVID-19 new cases. However, since the COVID-19 pandemic ended in 2023, the UPS stock price is predicted to remain relatively flat with a slight decrease. However, the model may not adequately capture the fluctuations observed in the original data.
Regarding the Canada freight value, both the fuel price and TSI exhibit correlations with the freight value, showcasing seasonal fluctuations. The trend of the Canada freight value over the next three years is expected to decrease. The model appears to capture the seasonal fluctuations present in the original data.
#revenue_tsautoplot(revenue_ts, facets=TRUE) +xlab("Year") +ylab("") +ggtitle("Air Cargo(Domestic), Air Cargo(International), Fuel Price and TSI in USA")
I have selected Air Cargo (Domestic), Air Cargo (International), Fuel Price, and TSI as variables for analysis. My objective is to explore the interdependencies among these variables. Specifically, I aim to understand whether there is a mutual influence between domestic and international air cargo, whether fuel prices affect air cargo volumes, and whether the TSI correlates with the other variables. By examining the relationships among these variables, I hope to gain insights into the dynamics of the air cargo industry and its drivers.
Code
#emp_tsautoplot(emp_ts, facets=TRUE) +xlab("Year") +ylab("") +ggtitle("Employment, Air Cargo(Domestic) and Air Cargo(International) in USA")
I have selected Air Cargo (Domestic), Air Cargo (International), and Employment as variables for analysis. My goal is to examine the potential mutual influence among these variables. Specifically, I aim to understand whether changes in domestic and international air cargo volumes impact employment levels, and vice versa. This analysis will help me understand the interconnectedness of these factors and their implications for economic activity and employment trends.
Code
#air_tsautoplot(air_ts, facets=TRUE) +xlab("Year") +ylab("") +ggtitle("Air GDP, Revenue and Employment in USA")
My objective is to explore the potential interrelationships among these variables. Specifically, I aim to understand how changes in GDP and Revenue within the air industry may influence employment levels, and vice versa. By investigating the relationships among these key indicators, I seek to gain insights into the dynamics of the air industry’s economic performance and its impact on employment trends.
Based on the selection criterion, models with lag orders p=3 and p=10 were considered. Upon fitting VAR(1), VAR(3), and VAR(10) models and comparing the results, it was observed that the VAR(10) model will be instability. Meanwhile, the VAR(3) model showed stronger correlations among the variables, indicating its potential as the best-fitting model.
Based on the selection criterion, models with lag orders p=3, p=4 and p=10 were considered. Upon fitting VAR(1), VAR(3), and VAR(4) models and comparing the results, it was observed that the VAR(10) model will be instability. Meanwhile, the VAR(3) model showed stronger correlations among the variables, indicating its potential as the best-fitting model.
Based on the selection criterion, models with lag orders p=1, and p=2 were considered. Upon fitting VAR(1), and VAR(2) models and comparing the results, it was observed that the model showed no stronger correlations among the variables, indicating VAR(1) as the best-fitting model.
#revenue_tsn=240k=48#12*4#n-k=192; 192/12=16;rmse1 <-matrix(NA, 192,4) #here is n-krmse2 <-matrix(NA, 192,4)rmse3 <-matrix(NA,16,12) #here n-k / 12 year<-c()ts_obj1 <-head(revenue_ts, -1)st <-tsp(ts_obj1 )[1]+(k-1)/12for(i in1:16) #here is (n-k ) / 12{ xtrain <-window(ts_obj1, end=st + i-1) xtest <-window(ts_obj1, start=st + (i-1) +1/12, end=st + i)######## first Model ############ fit <- vars::VAR(xtrain, p=4, type='both') fcast <-predict(fit, newdata=xtest, n.ahead =12) fdo<-fcast$fcst$Domestic fin<-fcast$fcst$International ffuel<-fcast$fcst$Fuel ftsi <-fcast$fcst$TSI ff<-data.frame(fdo[,1],fin[,1],ffuel[,1],ftsi[,1]) #collecting the forecasts for 4 variables year<-st + (i-1) +1/12#starting year####### convert to a time series object ######## ff<-ts(ff,start=c(year,1),frequency =12) a =12*i-11#going from 1,13, 15, (1,1+12,1+12+12, 1+12+12+12, ...) b=12*i #12, 24, 36##### collecting errors ###### rmse1[c(a:b),] <-sqrt((ff-xtest)^2)######## Second Model ############ fit2 <- vars::VAR(xtrain, p=3, type='both') fcast2 <-predict(fit2, newdata=xtest, n.ahead =12) fdo<-fcast2$fcst$Domestic fin<-fcast2$fcst$International ffuel<-fcast2$fcst$Fuel ftsi <-fcast2$fcst$TSI ff2<-data.frame(fdo[,1],fin[,1],ffuel[,1],ftsi[,1]) year<-st + (i-1) +1/12 ff2<-ts(ff2,start=c(year,1),frequency =12) a =12*i-11 b=12*i rmse2[c(a:b),] <-sqrt((ff2-xtest)^2)}yr =rep(c(2008:2023),each =12) #yearqr =rep(paste0("M",1:12),16) #monthlyrmse1 =data.frame(yr,qr,rmse1)names(rmse1) =c("Year", "Month", "Domestic", "International", "Fuel", "TSI")rmse2 =data.frame(yr,qr,rmse2)names(rmse2) =c("Year", "Month", "Domestic", "International", "Fuel", "TSI")ggplot() +geom_line(data = rmse1, aes(x = Year, y = Domestic),color ="blue") +geom_line(data = rmse2, aes(x = Year, y = Domestic),color ="red") +labs(title ="CV RMSE for Domestic",x ="Date",y ="RMSE",guides(colour=guide_legend(title="Fit")))
Code
ggplot() +geom_line(data = rmse1, aes(x = Year, y = International),color ="blue") +geom_line(data = rmse2, aes(x = Year, y = International),color ="red") +labs(title ="CV RMSE for International",x ="Date",y ="RMSE",guides(colour=guide_legend(title="Fit")))
Code
ggplot() +geom_line(data = rmse1, aes(x = Year, y = Fuel),color ="blue") +geom_line(data = rmse2, aes(x = Year, y = Fuel),color ="red") +labs(title ="CV RMSE for Fuel",x ="Date",y ="RMSE",guides(colour=guide_legend(title="Fit")))
Code
ggplot() +geom_line(data = rmse1, aes(x = Year, y = TSI),color ="blue") +geom_line(data = rmse2, aes(x = Year, y = TSI),color ="red") +labs(title ="CV RMSE for TSI",x ="Date",y ="RMSE",guides(colour=guide_legend(title="Fit")))
Based on cross-validation results, it was found that the VAR(3) model exhibits a lower RMSE compared to the VAR(4) model. This lower RMSE indicates that the VAR(3) model provides better predictive performance and is therefore considered the superior model in this context.
Code
#emp_tsn=217k=48#12*4#n-k=168; 168/12=14;rmse1 <-matrix(NA, 168,3) #here is n-krmse2 <-matrix(NA, 168,3)rmse3 <-matrix(NA,14,12) #here n-k / 12 year<-c()ts_obj2 <-head(emp_ts, -1)st <-tsp(ts_obj2 )[1]+(k-1)/12for(i in1:14) #here is (n-k ) / 12{ xtrain <-window(ts_obj2, end=st + i-1) xtest <-window(ts_obj2, start=st + (i-1) +1/12, end=st + i)######## first Model ############ fit <- vars::VAR(xtrain, p=3, type='both') fcast <-predict(fit, newdata=xtest, n.ahead =12) fin<-fcast$fcst$International fdo<-fcast$fcst$Domestic femp<-fcast$fcst$Employment ff<-data.frame(fin[,1],fdo[,1],femp[,1]) #collecting the forecasts for 3 variables year<-st + (i-1) +1/12#starting year####### convert to a time series object ######## ff<-ts(ff,start=c(year,1),frequency =12) a =12*i-11#going from 1,13, 15, (1,1+12,1+12+12, 1+12+12+12, ...) b=12*i #12, 24, 36##### collecting errors ###### rmse1[c(a:b),] <-sqrt((ff-xtest)^2)######## Second Model ############ fit2 <- vars::VAR(xtrain, p=4, type='both') fcast2 <-predict(fit2, newdata=xtest, n.ahead =12) fin<-fcast2$fcst$International fdo<-fcast2$fcst$Domestic femp<-fcast2$fcst$Employment ff2<-data.frame(fin[,1],fdo[,1],femp[,1]) year<-st + (i-1) +1/12 ff2<-ts(ff2,start=c(year,1),frequency =12) a =12*i-11 b=12*i rmse2[c(a:b),] <-sqrt((ff2-xtest)^2)}yr =rep(c(2010:2023),each =12) #yearqr =rep(paste0("M",1:12),14) #monthlyrmse1 =data.frame(yr,qr,rmse1)names(rmse1) =c("Year", "Month","International","Domestic","Employment")rmse2 =data.frame(yr,qr,rmse2)names(rmse2) =c("Year", "Month","International","Domestic","Employment")ggplot() +geom_line(data = rmse1, aes(x = Year, y = International),color ="blue") +geom_line(data = rmse2, aes(x = Year, y = International),color ="red") +labs(title ="CV RMSE for International",x ="Date",y ="RMSE",guides(colour=guide_legend(title="Fit")))
Code
ggplot() +geom_line(data = rmse1, aes(x = Year, y = Domestic),color ="blue") +geom_line(data = rmse2, aes(x = Year, y = Domestic),color ="red") +labs(title ="CV RMSE for Domestic",x ="Date",y ="RMSE",guides(colour=guide_legend(title="Fit")))
Code
ggplot() +geom_line(data = rmse1, aes(x = Year, y = Employment),color ="blue") +geom_line(data = rmse2, aes(x = Year, y = Employment),color ="red") +labs(title ="CV RMSE for Employment",x ="Date",y ="RMSE",guides(colour=guide_legend(title="Fit")))
Based on cross-validation results, it was found that the VAR(3) model exhibits a lower RMSE compared to the VAR(4) model. This lower RMSE indicates that the VAR(4) model provides better predictive performance and is therefore considered the superior model in this context.
Code
#air_tsn=10#372k=1#12*4#n-k=9; rmse1 <-matrix(NA, 9,3) #here is n-krmse2 <-matrix(NA, 9,3)rmse3 <-matrix(NA,9) #here n-kyear<-c()ts_obj <- air_tsst <-tsp(ts_obj )[1] for(i in1:9) #here is (n-k ) { xtrain <-window(ts_obj, end=st + i-1) xtest <-window(ts_obj, start=st + (i-1) +1, end=st + i)######## first Model ############ fit <- vars::VAR(ts_obj, p=1, type='both') fcast <-predict(fit, n.ahead =3) fgdp<-fcast$fcst$GDP fre<-fcast$fcst$Revenue femp<-fcast$fcst$Employment ff<-data.frame(fgdp[,1],fre[,1],femp[,1]) #collecting the forecasts for 3 variables year<-st + (i-1) #starting year####### convert to a time series object ######## ff<-ts(ff,start=c(year,1),frequency =1)##### collecting errors ###### rmse1[i-1,] <-sqrt((ff-xtest)^2)######## Second Model ############ fit2 <- vars::VAR(ts_obj, p=2, type='both') fcast2 <-predict(fit2, n.ahead =3) fgdp<-fcast2$fcst$GDP fre<-fcast2$fcst$Revenue femp<-fcast2$fcst$Employment ff2<-data.frame(fgdp[,1],fre[,1],femp[,1]) year<-st + (i-1) ff2<-ts(ff2,start=c(year,1),frequency =1) rmse2[i-1,] <-sqrt((ff2-xtest)^2)}yr =rep(c(2013:2021),each =1) #yearrmse1 =data.frame(yr,rmse1)names(rmse1) =c("Year", "GDP","Revenue","Employment")rmse2 =data.frame(yr,rmse2)names(rmse2) =c("Year", "GDP","Revenue","Employment")ggplot() +geom_line(data = rmse1, aes(x = Year, y = GDP),color ="blue") +geom_line(data = rmse2, aes(x = Year, y = GDP),color ="red") +labs(title ="CV RMSE for GDP",x ="Date",y ="RMSE",guides(colour=guide_legend(title="Fit")))
Code
ggplot() +geom_line(data = rmse1, aes(x = Year, y = Revenue),color ="blue") +geom_line(data = rmse2, aes(x = Year, y = Revenue),color ="red") +labs(title ="CV RMSE for Revenue",x ="Date",y ="RMSE",guides(colour=guide_legend(title="Fit")))
Code
ggplot() +geom_line(data = rmse1, aes(x = Year, y = Employment),color ="blue") +geom_line(data = rmse2, aes(x = Year, y = Employment),color ="red") +labs(title ="CV RMSE for Employment",x ="Date",y ="RMSE",guides(colour=guide_legend(title="Fit")))
Based on cross-validation results, it was found that the VAR(1) model exhibits a lower RMSE compared to the VAR(2) model. This lower RMSE indicates that the VAR(1) model provides better predictive performance and is therefore considered the superior model in this context.
The forecast of the variables appears to capture the trend of the original data adequately. While the fuel trend shows a slight decrease, the trends in other variables appear relatively stable or flat. Furthermore, the model seems to struggle in capturing the seasonality present in the data.
The most optimal model I fitted is the VAR(4) model.
The forecast of the variables appears to capture the trend of the original data adequately. While the employment trend shows a decreasing trend, the trends in other variables exhibit fluctuations, with an initial increase followed by a subsequent drop. However, it seems that the model struggles to capture the seasonality present in the data.
The most optimal model I fitted is the VAR(1) model.
The forecast of the variables does not appear to capture the trend of the original data adequately. All the variables exhibit fluctuations, characterized by an initial increase followed by a subsequent drop, which repeats over time. Additionally, the model diagnostics show no significant correlation among the variables, suggesting that the model performance may not be optimal.
Findings
The first model appears to capture the trend of the original data adequately. While the fuel trend shows a slight decrease, the trends in other variables appear relatively stable or flat. Furthermore, the model seems to struggle in capturing the seasonality present in the data.
The second model appears to capture the trend of the original data adequately. While the employment trend shows a decreasing trend, the trends in other variables exhibit fluctuations, with an initial increase followed by a subsequent drop. However, it seems that the model struggles to capture the seasonality present in the data.
The third model does not appear to capture the trend of the original data adequately. All the variables exhibit fluctuations, characterized by an initial increase followed by a subsequent drop, which repeats over time. Additionally, the model diagnostics show no significant correlation among the variables, suggesting that the model performance may not be optimal.